Performance and Scalability of Broadcast in Spark
Abstract
Although the MapReduce programming model has so far been highly successful, not all applications are well suited to it. Spark bridges this gap by providing seamless support for iterative and interactive jobs that are hard to express using the acyclic data flow model pioneered by MapReduce. While benchmarking Spark, we found that the default broadcast mechanism implemented in the Spark prototype hinders its scalability. In this report, we implement, evaluate, and compare four different broadcast mechanisms (including the default one) for Spark. We outline the basic requirements of a broadcast mechanism for Spark and analyze each of the compared mechanisms against those requirements. Our experiments in high-speed, low-latency, and cooperative data center environments also shed light on the characteristics of multicast and broadcast mechanisms in data centers in general.
Similar resources
An Overview of Group Key Management Issues in IEEE 802.16e Networks
The computer industry has defined the IEEE 802.16 family of standards that will enable mobile devices to access a broadband network as an alternative to digital subscriber line technology. As the mobile devices join and leave a network, security measures must be taken to ensure the safety of the network against unauthorized usage by encryption and group key management. IEEE 802.16e uses Multica...
Scalability Potential of BWA DNA Mapping Algorithm on Apache Spark
This paper analyzes the scalability potential of embarrassingly parallel genomics applications using the Apache Spark big data framework and compares their performance with native implementations as well as with Apache Hadoop scalability. The paper uses the BWA DNA mapping algorithm as an example due to its good scalability characteristics and due to the large data files it uses as input. Resul...
Experimental Investigation on Hydrous Methanol Fueled HCCI Engine Using Spark Assisted Method
The present work investigates the performance and emission characteristics of a hydrous methanol fuelled Homogeneous Charge Compression Ignition (HCCI) engine. A regular diesel engine has been modified to work as an HCCI engine. Hydrous methanol with 15% water content is used in this HCCI engine, and its performance and emission behavior are documented. A spark plug is used for ass...
Experimental Study of Performance of Spark Ignition Engine with Gasoline and Natural Gas
The tests were carried out with the spark timing adjusted to the maximum brake torque timing in various equivalence ratios and engine speeds for gasoline and natural gas operations. In this work, the lower heating value of gasoline is about 13.6% higher than that of natural gas. Based on the experimental results, the natural gas operation causes an increase of about 6.2% brake special fuel consumpt...
Ddup - towards a deduplication framework utilising apache spark
This paper is about a new framework called DeduPlication (DduP). DduP aims to solve large scale deduplication problems on arbitrary data tuples. DduP tries to bridge the gap between big data, high performance and duplicate detection. At the moment a first prototype exists but the overall project status is work in progress. DduP utilises the promising successor of Apache Hadoop MapReduce [Had14]...